Radial distribution of RNA genome packaged inside spherical viruses 
The problem of RNA genomes packaged inside spherical viruses is studied. The viral capsid is 
modeled as a hollow sphere. The attraction between RNA molecules and the inner viral capsid is 
assumed to be non-specific and occurs at the inner capsid surface only. For small capsid attraction, 
it is found that monomer concentration of RNA molecules is maximum at the center of the capsid 
to maximize their configurational entropy. For stronger capsid attraction, RNA concentration peaks 
at some distance near the capsid. In the latter case, the competition between the branching of RNA 
secondary structure and its adsorption to the inner capsid results in the formation of a dense layer 
of RNA near the capsid surface. The layer thickness is a slowly varying (logarithmic) function of the 
capsid inner radius. Consequently, for intermediate strength of RNA-capsid interaction, the amount of 
RNA packaged inside a virus is proportional to the capsid area (or the number of proteins) instead 
of its volume. The numerical profiles describe reasonably well the experimentally observed RNA 
nucleotide concentration profiles of various viruses. 

PACS numbers: 81.16.Dn, 87.16.A-, 87.19.rm 



Viruses attract broad interest from the physics community because of their ability to
self-assemble spontaneously. Many viruses can be produced both in vivo and in vitro as highly
robust and monodisperse particles. As a result, besides biomedical applications, understanding
virus assembly can also have promising applications in nanofabrication. At the basic level,
viruses consist of viral genomes (RNA or DNA molecules) packaged inside a protective protein
shell (the viral capsid). The structures of viral capsids for most viruses are well understood
from high-resolution experiments using cryoelectron microscopy or X-ray analysis [?], as well
as from theoretical studies [?]. Single-stranded RNA (ssRNA) viruses also package their genome
spontaneously during assembly. Several theoretical studies have demonstrated that the
interaction between capsid proteins and RNA nucleotide bases plays an important role in the RNA
packaging process, both energetically and kinetically [?]. However, unlike the structure of
viral capsids, the structure of packaged RNA still lacks a general understanding. Different
models of RNA packaging inside viruses have been studied in Refs. [?]. However, all these works
treat RNA molecules as linear flexible polymers. In this letter, we address the question of how
RNA molecules are arranged inside a spherical virus, explicitly taking into account the
branching degree of freedom of the RNA secondary structure.

We focus on a particular class of ssRNA viruses where the interaction between capsid proteins
and RNA molecules is non-specific and occurs dominantly at the inner surface of the capsid.
This is the case for viruses where basic amino acids are located on the surface and
electrostatic interaction is strongly screened in the bulk solution (examples of such viruses
are bacteriophage MS2, Qβ, Dengue, immature Yellow Fever and, more generally, viruses belonging
to groups B and C mentioned in Ref. [?]). (In some viruses, such as pariacoto virus [10], the
viral capsid forces some fraction of the RNA molecules to adopt its dodecahedral structure. In
that case, the theory presented below should be applied to the free fraction of these RNAs.)
Even though the RNA-capsid interaction only occurs at the surface, the RNA radial concentration
profile and the amount of RNA packaged inside a virus can be dictated by the strength of this
interaction. The main result of this paper is that there are two different profiles for the
radial RNA nucleotide concentration. For small capsid attraction, the RNA concentration is
maximum at the center of the capsid. A representative virus for this profile (the Dengue virus)
is shown in Fig. 1a. For larger capsid attraction, the RNA concentration is maximum at a
distance close to (but always smaller than) the inner capsid radius. A representative virus for
this profile (the bacteriophage MS2) is shown in Fig. 1b. In the latter case, the RNA molecules
form a dense layer at the inner capsid surface. The thickness of this layer varies very slowly
(logarithmically) with the capsid radius. As a result, the amount of RNA packaged inside such
viruses is proportional to the capsid area (or the number of capsid proteins) instead of its
volume.

FIG. 1: Two different profiles for RNA monomer concentration inside spherical viruses. Points
are experimental data and solid lines are theoretical fits. a) Profile II, Eq. (9), fitted to
the RNA concentration of Dengue virus obtained from a cryoelectron microscopy experiment [11].
b) Profile III, Eq. (11), fitted to the RNA concentration of bacteriophage MS2 obtained from a
small-angle neutron scattering experiment [12].

FIG. 2: Schematic representation of the secondary structure of a single-stranded RNA molecule
as a collection of linear sections, branch-points, and end-points. The molecule can freely
fluctuate between different branching configurations.

It is well known that ssRNA molecules fold on themselves due to base-pairing interactions
between their nucleotides. Because the nucleotide sequence of ssRNA molecules is not perfect
for such pairing, their secondary structure is highly nonlinear. To a first approximation, RNA
molecules are considered to be highly flexible branched polymers which can fluctuate freely
over all possible branching configurations. Different branching configurations are described in
the schematic way shown in Fig. 2, characterized by fugacities for "bi-functional" units
(linear sequences), "tri-functional" units (branching points) and "endpoints" (stem-loops or
hairpins). We assume good solvent conditions with repulsive interactions between the different
units (with no "tertiary" pairing). Using a mean-field approximation [?] to a field theory for
solutions of branching polymers of this type [13], one can write down an expression for the
free energy density of the RNA solution, $W[Q(\vec{r})]$, as



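$$ \frac{W[Q(\vec{r})]}{m} = \frac{\epsilon}{2} Q(\vec{r})^2 - \frac{w}{6} Q(\vec{r})^3 + \frac{1}{8} m u\, Q(\vec{r})^4 - h\, Q(\vec{r}), \tag{1} $$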



where $\epsilon$, $w$, $h$ and $m$ are the fugacities of the monomers, the branch points, the
end-points and the whole polymers, respectively. The coefficient $u$ is proportional to the
second-order virial coefficient for monomer-monomer interaction (since RNA molecules are
assumed to be in good solvent, $u$ is positive). $Q(\vec{r})$ is the order parameter of the
field theory and is proportional to the concentration of end-points. Note that if one sets
$w = 0$ (the branching degree of freedom is suppressed), Eq. (1) recovers the well-known
expression for the free energy density of a solution of linear polymers [15]. Based on this
mean-field expression, it has been suggested that RNA molecules are prone to a surface
condensation which is different from that of linear polymers [14]. In this paper, we use the
mean-field expression, Eq. (1), to study how RNA molecules are packaged inside a virus. For
simplicity, we model the viral capsid as a hollow sphere with inner radius $R$. We also assume
that the RNA distribution inside the capsid is spherically symmetric, so that
$Q(\vec{r}) = Q(r)$, where $r$ is the radial distance from the center of the viral capsid. As a
result, the excess free energy of the RNA molecules packaged inside a capsid can be written as



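$$ H_{mf} = H_s + \int_0^R 4\pi r^2\, dr \left[ \frac{m}{2} \left( \frac{dQ}{dr} \right)^2 + \Delta W[Q(r)] \right], \tag{2} $$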



with $\Delta W[Q(r)] = W[Q(r)] - W[Q_{bulk}]$. The first term in Eq. (2) denotes the
interaction energy of the capsid proteins with the RNA molecules. Assuming this interaction
occurs only at the inner capsid surface, $H_s$ can be written as the sum of contributions from
monomer and endpoint adsorption:



$$ H_s = 4\pi R^2 m \left[ -\gamma_1 Q(R) - \gamma_2 Q(R)^2 / 2 \right], \tag{3} $$



where $\gamma_{1,2}$ are the strengths of the adsorption.

Due to the cubic term proportional to $w$ in Eq. (1), for small positive $\epsilon$ the free
energy density $W(Q)$ has two minima, $Q_D$ and $Q_C$, corresponding, respectively, to the
mean-field order parameter of a dilute bulk RNA solution and that of a condensed bulk RNA
solution. A first-order condensation transition takes place when $W(Q_D) = W(Q_C)$. We will
always assume that the RNA solution lies in this coexistence regime, so that the dilute and
dense phases of the RNA solution are close in energy. Therefore, we set the bulk value
$Q_{bulk} = Q_D$. The equilibrium RNA concentration profile corresponds to the profile $Q(r)$
that minimizes the Hamiltonian, Eq. (2). Setting the functional derivative
$\delta H_{mf}/\delta Q$ to zero, we obtain the Euler-Lagrange equation



$$ \frac{d^2 Q}{dr^2} + \frac{2}{r} \frac{dQ}{dr} - \frac{1}{m} \frac{d\Delta W}{dQ} = 0, \tag{4} $$



and a boundary condition at the inner capsid surface:

$$ \left. \frac{dQ}{dr} \right|_{r=R} = \frac{1}{4\pi R^2 m} \frac{\partial H_s}{\partial Q(R)} = -\gamma_1 - \gamma_2 Q(R). \tag{5} $$



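To proceed further, we approximate $\Delta W$ using the double parabolic potential form [17]:

$$ \Delta W(Q) \simeq \begin{cases} \dfrac{m \lambda_D^2}{2} (Q - Q_D)^2 & \text{for } Q < Q_m, \\[2mm] \dfrac{m \lambda_C^2}{2} (Q - Q_C)^2 & \text{for } Q > Q_m, \end{cases} \tag{6} $$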



where $Q_m = (\lambda_D Q_D + \lambda_C Q_C)/(\lambda_D + \lambda_C)$ is the point where the
two parabolas cross each other, forming a cusp. The two coefficients $\lambda_{D,C}$ are the
stiffnesses of the free energy density of the RNA solution near the two minima. They are
inversely proportional to the corresponding correlation lengths of the two phases. In general,
this double parabolic potential form for the free energy density breaks down near the critical
temperature, where the first-order transition becomes second order, or when the fugacity of
branch points, $w$, goes to 0 (the branching degree of freedom is suppressed and RNA molecules
behave as linear polymers). However, it was shown [?] that the mean-field expression, Eq. (1),
breaks down before this limit is approached. If one stays within the limit of mean-field
theory, the double parabola approximation is a reasonable one. We will come back to its
limitations in the later discussion. With this approximate form of $\Delta W$, Eq. (4) becomes
linear and easy to solve. The general solution is a linear combination of
$\exp(\pm \lambda_{D,C}\, r)/r$. There are three possible concentration profiles for the RNA
molecules.
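Before turning to the individual profiles, the linearization can be checked numerically. The sketch below evaluates the double-parabola form of Eq. (6), confirms that its two branches meet at $Q_m$, and verifies by finite differences that a combination of $\exp(\pm\lambda r)/r$ solves the linearized Eq. (4). All parameter values and function names are illustrative assumptions rather than fitted values, and the prefactor $m\lambda^2/2$ of each parabola is the one implied by Eq. (4) together with the quoted general solution.

```python
import math

# Illustrative parameters (assumed, arbitrary units; not fitted values).
M = 1.0                    # polymer fugacity m
LAM_D, LAM_C = 0.5, 1.0    # stiffness of the dilute / condensed minimum
Q_D, Q_C = 0.1, 1.0        # order parameter at the two minima

# Crossing point of the two parabolas (the cusp).
Q_M = (LAM_D * Q_D + LAM_C * Q_C) / (LAM_D + LAM_C)


def delta_w(q):
    """Double-parabola approximation of the excess free energy density."""
    if q < Q_M:
        return 0.5 * M * LAM_D**2 * (q - Q_D) ** 2
    return 0.5 * M * LAM_C**2 * (q - Q_C) ** 2


def linearized_residual(q_func, r, lam, q_min, h=1e-3):
    """Finite-difference value of Q'' + (2/r) Q' - lam^2 (Q - q_min) at r."""
    d1 = (q_func(r + h) - q_func(r - h)) / (2 * h)
    d2 = (q_func(r + h) - 2 * q_func(r) + q_func(r - h)) / h**2
    return d2 + 2 * d1 / r - lam**2 * (q_func(r) - q_min)


def trial_solution(r, a=0.3, b=0.01):
    """Q_C plus an arbitrary mix of exp(-lam_C r)/r and exp(+lam_C r)/r."""
    return Q_C + (a * math.exp(-LAM_C * r) + b * math.exp(LAM_C * r)) / r


if __name__ == "__main__":
    eps = 1e-9
    print("Q_m =", Q_M)
    print("mismatch at cusp:", delta_w(Q_M - eps) - delta_w(Q_M + eps))
    print("residual, correct lambda:",
          linearized_residual(trial_solution, 2.0, LAM_C, Q_C))
    print("residual, wrong lambda:  ",
          linearized_residual(trial_solution, 2.0, LAM_D, Q_C))
```

The residual is close to zero only when the stiffness used in the equation matches the one in the exponentials, which is the sense in which each phase has its own decay length.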

Profile I. If $Q(r) < Q_m$ for all $r$, then the solution to the Euler-Lagrange equation is

$$ Q(r) = -C_{10} \frac{\sinh(\lambda_D r)}{\lambda_D r} + Q_D, \tag{7} $$

where

$$ C_{10} = \frac{(\gamma_1 + \gamma_2 Q_D) R}{\cosh(\lambda_D R) + (\gamma_2 R - 1) \sinh(\lambda_D R)/(\lambda_D R)}. \tag{8} $$

Because the interaction of the RNA monomers with the viral capsid is attractive,
$\gamma_{1,2} > 0$, the coefficient $C_{10}$ is a positive quantity. According to Eq. (7), this
means that for all $r$ the endpoint (and monomer) concentration in this profile is always
smaller than the bulk value, $Q(r) < Q_D = Q_{bulk}$. This is an unphysical situation.
Therefore, we discard this solution from later consideration.

Profile II. The second possibility is that $Q > Q_m$ for all $r$. Accordingly, the solution is

$$ Q(r) = -C_{20} \frac{\sinh(\lambda_C r)}{\lambda_C r} + Q_C, \tag{9} $$

where

$$ C_{20} = \frac{(\gamma_1 + \gamma_2 Q_C) R}{\cosh(\lambda_C R) + (\gamma_2 R - 1) \sinh(\lambda_C R)/(\lambda_C R)}. \tag{10} $$

This solution is a monotonically decreasing function of $r$, and the RNA concentration is
maximum at the center of the capsid. Because of the requirement that $Q(R)$ must be greater
than $Q_m$, this profile is possible only for very weak adsorption (in practice, for
$\lambda_C R \gg 1$, this requirement means $(\gamma_1/Q_C + \gamma_2)/\lambda_D < 1$). As a
result, RNA monomers concentrate at the center of the capsid to gain configurational entropy
(minimizing the gradient term in Eq. (2)).
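As an illustration of Profile II, the following sketch evaluates the profile and its amplitude $C_{20}$, checks the boundary condition Eq. (5) by finite differences, and tests the self-consistency requirement $Q(R) > Q_m$ for one weak and one strong choice of adsorption strength. The parameter values and function names are assumptions made for the example only.

```python
import math

# Illustrative parameters (assumed, arbitrary units; not fitted values).
LAM_D, LAM_C = 0.5, 1.0   # inverse correlation lengths of the two phases
Q_D, Q_C = 0.1, 1.0       # dilute and condensed bulk order parameters
R = 10.0                  # inner capsid radius, so LAM_C * R = 10
Q_M = (LAM_D * Q_D + LAM_C * Q_C) / (LAM_D + LAM_C)


def sinhc(x):
    """sinh(x)/x with its x -> 0 limit."""
    return math.sinh(x) / x if abs(x) > 1e-8 else 1.0


def c20(gamma1, gamma2):
    """Amplitude of Profile II fixed by the boundary condition at r = R."""
    x = LAM_C * R
    denom = math.cosh(x) + (gamma2 * R - 1.0) * sinhc(x)
    return (gamma1 + gamma2 * Q_C) * R / denom


def profile_ii(r, gamma1, gamma2):
    """Profile II: Q(r) = Q_C - C20 sinh(lam_C r)/(lam_C r)."""
    return Q_C - c20(gamma1, gamma2) * sinhc(LAM_C * r)


def bc_residual(gamma1, gamma2, h=1e-4):
    """dQ/dr + gamma1 + gamma2 Q at r = R (derivative by central differences)."""
    dq = (profile_ii(R + h, gamma1, gamma2)
          - profile_ii(R - h, gamma1, gamma2)) / (2 * h)
    return dq + gamma1 + gamma2 * profile_ii(R, gamma1, gamma2)


def is_self_consistent(gamma1, gamma2):
    """Profile II needs Q > Q_m everywhere; Q is smallest at r = R."""
    return profile_ii(R, gamma1, gamma2) > Q_M


if __name__ == "__main__":
    for g1, g2 in [(0.02, 0.05), (0.5, 0.5)]:
        print(g1, g2,
              "Q(0) =", profile_ii(0.0, g1, g2),
              "Q(R) =", profile_ii(R, g1, g2),
              "self-consistent:", is_self_consistent(g1, g2))
```

For these assumed values, $(\gamma_1/Q_C + \gamma_2)/\lambda_D$ equals 0.14 in the first case and 2 in the second, so the two cases fall on opposite sides of the approximate criterion quoted above.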

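Profile III. The third possibility is that $Q(r)$ passes through $Q_m$ at some distance
$r = r_0$ ($0 < r_0 < R$) such that $Q(r = r_0) = Q_m$. We can interpret $r_0$ as the boundary
between the dilute and the condensed phases of RNA molecules inside the capsid. Requiring the
density profile $Q(r)$ and its derivative $Q'(r)$ to be continuous at $r_0$, we get

$$ Q(r) = \begin{cases} (Q_0 - Q_D) \dfrac{\sinh(\lambda_D r)}{\lambda_D r} + Q_D & \text{for } r < r_0, \\[2mm] C_{31} \dfrac{\exp(\lambda_C r)}{\lambda_C r} + C_{32} \dfrac{\exp(-\lambda_C r)}{\lambda_C r} + Q_C & \text{for } r_0 < r < R, \end{cases} \tag{11} $$

where $Q_0 \equiv Q(0)$ and

$$ \begin{aligned}
C_{31} = {} & -\exp[-(\lambda_C + \lambda_D) r_0]\, (Q_0 - Q_D)(\lambda_C/\lambda_D - 1)/4 \\
& + \exp[-(\lambda_C - \lambda_D) r_0]\, (Q_0 - Q_D)(\lambda_C/\lambda_D + 1)/4 \\
& - \exp(-\lambda_C r_0)\, (Q_C - Q_D)(\lambda_C r_0 + 1)/2, \\
C_{32} = {} & \exp[(\lambda_C + \lambda_D) r_0]\, (Q_0 - Q_D)(\lambda_C/\lambda_D - 1)/4 \\
& - \exp[(\lambda_C - \lambda_D) r_0]\, (Q_0 - Q_D)(\lambda_C/\lambda_D + 1)/4 \\
& - \exp(\lambda_C r_0)\, (Q_C - Q_D)(\lambda_C r_0 - 1)/2.
\end{aligned} $$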

$r_0$ and $Q_0$ are the two unknowns in the solution above. They can be determined from the
boundary condition, Eq. (5), and the condition $Q(r_0) = Q_m$. The latter condition gives

$$ Q_0 = Q_D + (Q_C - Q_D) \frac{\lambda_C}{\lambda_C + \lambda_D} \frac{\lambda_D r_0}{\sinh(\lambda_D r_0)}. \tag{12} $$



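Substituting Eqs. (11) and (12) into the boundary condition, Eq. (5), we arrive at the equation
for $r_0$:

$$ \left( 1 + \frac{\gamma_2}{\lambda_C} - \frac{1}{\lambda_C R} \right) \left[ 1 + \frac{\lambda_D}{\lambda_C} - \frac{\lambda_D r_0 \exp(-\lambda_D r_0)}{\sinh(\lambda_D r_0)} \right] u^2 - 2 \lambda_s R\, u + \left( 1 - \frac{\gamma_2}{\lambda_C} + \frac{1}{\lambda_C R} \right) \left[ 1 + \frac{\lambda_D}{\lambda_C} - \frac{\lambda_D r_0 \exp(\lambda_D r_0)}{\sinh(\lambda_D r_0)} \right] = 0, \tag{13} $$

where $u = \exp[\lambda_C (R - r_0)]$. The parameter

$$ \lambda_s = (1 + \lambda_D/\lambda_C)(\gamma_1 + \gamma_2 Q_C)/(Q_C - Q_D) \tag{14} $$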

is proportional to the strength of RNA adsorption at the inner capsid surface and has the
dimension of inverse length. Obtaining an analytical solution for $r_0$ from Eq. (13) is a
highly non-trivial task, and a numerical solution is generally needed. Nevertheless, we can
understand important qualitative features of the RNA concentration profile by solving for $r_0$
in the limit of strong capsid-RNA adsorption ($\lambda_s R \gg 1$) and small correlation length
of the RNA concentrated phase ($\lambda_C R \gg 1$). In this limit, the first two terms in
Eq. (13) are the dominant ones. Balancing them, we get $u \simeq 2 \lambda_s R$, or

$$ r_0 \simeq R - \lambda_C^{-1} \ln(2 \lambda_s R). \tag{15} $$



As we mentioned above, $r_0$ can be considered as the boundary between a dense RNA phase near
the capsid and a dilute RNA phase at the capsid center. The quantity $d = R - r_0$ can
therefore be considered the thickness of this dense RNA layer. According to Eq. (15),
$d \propto \ln R$, which is parametrically smaller than the capsid radius $R$ [?]. In other
words, our RNA concentration profile shows a dense RNA layer condensed on the inner capsid
surface, with a thickness which varies very slowly with the capsid radius. Consequently, the
amount of RNA packaged inside the virus is proportional to the capsid area (or the number of
capsid proteins) instead of its volume. In recent works [?], a similar dependence is observed
when the positively charged amino acids of capsid proteins are located in their long flexible
peptide arms. In those works, the thickness of the layer of RNA molecules (treated as linear
polymers) depends on the length of these arms. On the other hand, for the class of viruses we
study in this paper, where the basic amino acids are located at the inner capsid surface
instead of on peptide arms, the competition between the branching degree of freedom of the RNA
secondary structure and the attraction of capsid proteins is responsible for the layer
structure, and the thickness scales as $\ln R$. Another interesting feature of RNA
concentration profile III is the fact that it does not peak at the inner capsid radius $R$ but
at some smaller radius. This is a direct consequence of the boundary condition, Eq. (5), which
forces the RNA concentration to decrease in the vicinity of the capsid.
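The features of Profile III discussed above can be illustrated numerically. The sketch below is a minimal example under assumed parameter values: instead of coding Eq. (13) directly, it imposes the boundary condition Eq. (5) on the piecewise profile of Eq. (11), with $Q_0$ from Eq. (12), and finds $r_0$ by bisection (a solver chosen here for simplicity). It then compares the layer thickness $d = R - r_0$ with the estimate of Eq. (15) and locates the maximum of the profile. All names and values in the code are hypothetical.

```python
import math

# Illustrative parameters (assumed, arbitrary units; not fitted values).
LAM_C, LAM_D = 1.0, 1.0     # inverse correlation lengths (condensed, dilute)
Q_C, Q_D = 1.0, 0.1         # bulk order parameters of the two phases
GAMMA1, GAMMA2 = 0.1, 0.2   # adsorption strengths
R = 20.0                    # inner capsid radius
Q_M = (LAM_D * Q_D + LAM_C * Q_C) / (LAM_D + LAM_C)
LAM_S = (1 + LAM_D / LAM_C) * (GAMMA1 + GAMMA2 * Q_C) / (Q_C - Q_D)  # Eq. (14)


def coefficients(r0):
    """Q0 from Eq. (12); C31 and C32 from continuity of Q and Q' at r0."""
    ratio, dq = LAM_C / LAM_D, Q_C - Q_D
    q0 = Q_D + dq * LAM_C / (LAM_C + LAM_D) * LAM_D * r0 / math.sinh(LAM_D * r0)
    a = q0 - Q_D
    c31 = (-math.exp(-(LAM_C + LAM_D) * r0) * a * (ratio - 1) / 4
           + math.exp(-(LAM_C - LAM_D) * r0) * a * (ratio + 1) / 4
           - math.exp(-LAM_C * r0) * dq * (LAM_C * r0 + 1) / 2)
    c32 = (math.exp((LAM_C + LAM_D) * r0) * a * (ratio - 1) / 4
           - math.exp((LAM_C - LAM_D) * r0) * a * (ratio + 1) / 4
           - math.exp(LAM_C * r0) * dq * (LAM_C * r0 - 1) / 2)
    return q0, c31, c32


def profile_iii(r, r0):
    """Piecewise Profile III, Eq. (11), for a given boundary position r0."""
    q0, c31, c32 = coefficients(r0)
    if r < r0:
        x = LAM_D * r
        return Q_D + (q0 - Q_D) * (math.sinh(x) / x if x > 1e-8 else 1.0)
    x = LAM_C * r
    return Q_C + (c31 * math.exp(x) + c32 * math.exp(-x)) / x


def bc_residual(r0):
    """dQ/dr + gamma1 + gamma2 Q at r = R, i.e. Eq. (5) moved to one side."""
    _, c31, c32 = coefficients(r0)
    x = LAM_C * R
    dq_dr = LAM_C * (c31 * math.exp(x) * (x - 1)
                     - c32 * math.exp(-x) * (x + 1)) / x**2
    return dq_dr + GAMMA1 + GAMMA2 * profile_iii(R, r0)


def solve_r0(lo=0.05 * R, hi=R, tol=1e-12):
    """Bisection on the boundary-condition residual."""
    f_lo, f_hi = bc_residual(lo), bc_residual(hi)
    if f_lo * f_hi > 0:
        raise ValueError("no sign change: Profile III is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = bc_residual(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    r0 = solve_r0()
    grid = [R * i / 2000 for i in range(2001)]
    r_peak = max(grid, key=lambda r: profile_iii(r, r0))
    print("r0 =", r0, " d =", R - r0)
    print("Eq. (15) estimate of d =", math.log(2 * LAM_S * R) / LAM_C)
    print("profile maximum at r =", r_peak, " (R =", R, ")")
```

For these assumed values the numerically determined thickness is of the same order as the logarithmic estimate but not equal to it, as expected for a leading-order result, and the maximum of the profile lies between $r_0$ and $R$.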

In Fig. 1 we plot examples of the two profiles, Eq. (9) and Eq. (11), fitted to the
experimental data for two viruses, the Dengue virus and bacteriophage MS2. The data for the
Dengue virus were obtained using cryoelectron microscopy [11]. The data for bacteriophage MS2
were obtained using small-angle neutron scattering measurements [12]. Both viruses have most of
their basic amino acids located on the inner surface of the capsid; therefore our model capsid
can be used. Both theoretical profiles show reasonable agreement with the experimental results.

So far, when solving the Euler-Lagrange equation for the RNA density profile, we have assumed
that $Q(r)$ crosses the value $Q_m$ at most once. There is certainly a possibility that $Q(r)$
crosses $Q_m$ multiple times as $r$ increases from zero to $R$. This results in an oscillating
RNA concentration profile. One could extend the calculation presented in this paper to such a
case by adding more piecewise solutions to the ansatz, Eq. (11), and requiring $Q(r)$ and its
derivative to be continuous at the crossing points. Such an extension could offer insights, for
example, into the oscillating radial profile of RNA molecules packaged inside Turnip Yellow
Mosaic Virus (TYMV) [?]. Nevertheless, these cases are relatively uncommon and the calculations
would go beyond the scope of this letter. We will address these cases in more detail in a
future study.

Naturally, one wants to know which RNA concentration profile is the most thermodynamically
stable. To answer this question, one needs to substitute these profiles (Eq. (9) and Eq. (11))
into the original expression for the capsid excess free energy, Eq. (2), and compare the
resulting energies. This is a tedious task. Numerically, it is found that for small adsorption
strength of the viral capsid, the second profile is thermodynamically stable and the RNA
concentration is maximum at the capsid center. For stronger surface adsorption, the third
profile is lower in energy. In this case, RNA molecules form a dense layer at the capsid and
the RNA concentration is maximum at a finite radius smaller than $R$.

It is known [3] that the mean-field theory, Eq. (1), breaks down when the critical point is
approached and the first-order transition between the dilute and condensed phases of the RNA
solution becomes second order. Once this happens, a physical picture similar to that of a
solution of branched polymers with frozen branching arrangement emerges [?]. In this case, the
RNA molecules become unscreened and non-overlapping. For viruses with several packaged RNA
molecules, each of them would adsorb independently onto the capsid, and the layer thickness of
each molecule scales as the square root of its molecular weight. Conversely, if such separation
between constituent viral genomes is observed, it would signal the breakdown of mean-field
theory.

In conclusion, in this paper we found two different nu- 
cleotide concentration profiles of viral RNA molecules 
packaged inside spherical viruses. The theory applies to 
a class of viruses where capsid-RNA interaction occurs at 
the capsid surface only. For small interaction strength, 
the RNA monomer concentration is maximized at the 
center of the capsid to maximize their configurational 
entropy. For higher interaction strength, RNA forms a 
dense layer near the capsid surface. The thickness of this 
layer is a slowly varying (logarithmic) function of the in- 
ner capsid radius. In this case, the amount of packaged 
RNA would be proportional to capsid area (or number of 
capsid proteins) instead of its volume. The profiles de- 
scribe reasonably well the experimental profiles for vari- 
ous viruses. 